Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines
Authors
Abstract
Data processing pipelines are one of the most common kinds of astronomical software. Such programs are chains of processes that transform raw data into valuable information. This work presents a Python framework for generating astronomical pipelines. It implements the Model-View-Controller (MVC) design pattern on top of an SQL relational database and can handle custom data models, processing stages, and result-communication alerts, as well as produce automatic quality and structural measurements. The pattern provides a separation of concerns between the user logic and data models on one side and the pipeline's processing flow on the other, and it delivers multiprocessing and distributed computing capabilities at no extra cost to the user. For the astronomical community, this is an improvement over previous data processing pipelines: programmers no longer have to deal with the processing flow or with parallelization issues, and can focus solely on the algorithms involved in the successive data transformations. The software, as well as working examples of pipelines, is available to the community at https://github.com/toros-astro.
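The abstract describes the MVC split only in prose. The following is a minimal, hypothetical sketch of that separation of concerns, not Corral's actual API: the names `Step`, `Calibrate`, `FlagBright`, `Alert`, `run_step`, `run_pipeline`, the `image` table, the calibration factor, and the brightness threshold are all invented for illustration. It assumes Python's standard `sqlite3` module as the relational database and uses a `ThreadPoolExecutor` from `concurrent.futures` as a portable stand-in for the framework's multiprocessing; a real deployment might swap in a process pool or a distributed backend.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Model (hypothetical schema): each row carries a status that tells the
# framework which stage of the pipeline it has reached.
SCHEMA = """
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    raw_flux REAL NOT NULL,
    calibrated_flux REAL,
    is_bright INTEGER,
    status TEXT NOT NULL DEFAULT 'raw'
)
"""


def make_db(fluxes):
    """Create an in-memory database and load one row per raw flux value."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO image (raw_flux) VALUES (?)", [(f,) for f in fluxes])
    return conn


class Step:
    """Controller: user code only states which rows to take and how to transform one row."""
    source_status = target_status = output_column = None

    def process(self, row):
        raise NotImplementedError


class Calibrate(Step):
    source_status, target_status, output_column = "raw", "calibrated", "calibrated_flux"

    def process(self, row):
        return row["raw_flux"] * 0.5  # hypothetical calibration factor


class FlagBright(Step):
    source_status, target_status, output_column = "calibrated", "flagged", "is_bright"

    def process(self, row):
        return int(row["calibrated_flux"] > 3.0)  # hypothetical brightness threshold


class Alert:
    """View: collects messages about results; it never drives the processing flow."""

    def __init__(self):
        self.messages = []

    def notify(self, row_id, step, value):
        self.messages.append(f"image {row_id} -> {step.target_status}: {step.output_column}={value}")


def run_step(conn, step, alert, max_workers=4):
    """Framework: querying, parallel execution and state updates live here, not in user code."""
    rows = conn.execute(
        "SELECT * FROM image WHERE status = ? ORDER BY id", (step.source_status,)
    ).fetchall()
    # Rows are independent, so the user's transformation is mapped over a worker pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(step.process, rows))
    with conn:  # commit every update for this step in one transaction
        for row, value in zip(rows, values):
            # output_column comes from trusted class attributes, not from user input.
            conn.execute(
                f"UPDATE image SET {step.output_column} = ?, status = ? WHERE id = ?",
                (value, step.target_status, row["id"]),
            )
            alert.notify(row["id"], step, value)
    return len(rows)


def run_pipeline(conn, steps, alert):
    """Run the chain of steps in order; return how many rows each step handled."""
    return [run_step(conn, step, alert) for step in steps]


if __name__ == "__main__":
    conn = make_db([10.0, 4.0])
    alert = Alert()
    print(run_pipeline(conn, [Calibrate(), FlagBright()], alert))
    print("\n".join(alert.messages))
```

Run as a script, this prints `[2, 2]` followed by one alert line per row and step. The point of the design is that `Calibrate.process` and `FlagBright.process` contain only the transformation, while selecting rows by status, running them in parallel, and committing state changes all happen in `run_step`; running the pipeline a second time processes nothing, because every row has already advanced past each step's source status.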
Journal: CoRR
Volume: abs/1701.05566
Publication date: 2017